Tables and Figures

Figure 1. Still frames from videos shown to participants in Experiments 1-5, including stimuli from habituation (A) and test (B). In each video, a person reached for and caused a change in an object (H1-H3, T1-T2), or picked up the object (H4-H5, T3-T4), over a barrier (H1, H3-H5) or over empty space (H2, T1-T4). The person either acted on the object by contacting it (H1-H2, H4-H5, T1, T3-T4) or produced the same effect from a distance of 50 pixels, after a 0.5s delay (H3, T2), and either performed these actions while wearing a mitten (H1-H4, T1-T3) or with a bare hand (H5, T4). During test (B), the person either reached directly for the object on a novel but efficient trajectory (left panels), or in a curvilinear fashion on the familiar but inefficient trajectory (right panels).


Figure 2. Looking time in seconds towards the efficient versus inefficient reach (bottom), and proportion looking towards the inefficient reach (top) at test across Experiments 1-5 (N=152). Labels above each panel list the experiment name (Exp. 1-5), whether actions during habituation were constrained or unconstrained by a barrier, goal (state.change or pick.up), whether these actions involved contact with the object, whether the actor wore a mitten, and video displays listed in Figure 1. Error bars around means indicate within-subjects 95% confidence intervals (bottom) and bootstrapped 95% confidence intervals (top). Individual points (top) or pairs of connected points (bottom) indicate data from a single participant. Horizontal bars within boxes indicate medians, and boxes indicate the middle 2 quartiles of data. Violin plots (top) indicate distribution of data, area scaled proportionally to the number of observations.


Results (Main Text)

Infants’ analysis of causal vs non-causal actions

Experiment 1

In Experiment 1, infants (N=40; 20 per condition; Mean age=108 days; range=91-122, 23 female) were habituated to and tested on video clips where a person reached for and caused an object to illuminate by touching it (Figure 1) while we measured their looking times. First, we randomly assigned infants to one of two habituation conditions: In the constrained condition, infants watched the person reach over a barrier that prevented direct access to the object (Figure 1A, H1), and in the unconstrained condition, infants watched the person perform the same reaches with the barrier behind the goal object, out of the actor’s way (H2) (an action that appears to adults and older infants to be inefficient (36)). After infants habituated to these events (i.e. their attention declined by 50%), or after 12 trials, whichever came first, we measured their attention to alternating test events (Figure 1B, T1). During test, the person either reached on the same curvilinear path towards the object (a familiar but newly inefficient action) or on a direct path (a novel but newly efficient action). For methodological details, see SM. Across all experiments, we calculated the average looking time towards the efficient versus inefficient reach over 3 pairs of test events. Infant looking times are often log-normally distributed (37), including in this dataset (see Figure S3) and thus were log-transformed (main results) or transformed to proportions (meta-analysis, see SM) prior to analysis. If prereaching infants view reaching actions as constrained by barriers and expect them to be efficient, then those in the constrained condition should look longer, at test, when the actor reached on the superficially familiar but inefficient curvilinear trajectory. All videos are open source and available at https://osf.io/fe4wj/.
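
The habituation rule described above (a 50% decline in attention, or 12 trials, whichever comes first) can be sketched as follows. This is a minimal sketch rather than the authors’ procedure: the text does not say how the decline was computed, so comparing the summed looking time on the most recent three trials against the first three is an assumption, as are the function name `trials_to_habituate` and the example looking times. The actual criterion is in the SM.

```python
def trials_to_habituate(looks, window=3, decline=0.5, max_trials=12):
    """Return the number of habituation trials shown before test begins.

    looks: looking times in seconds, one per habituation trial, in order.
    The window size and the first-vs-recent comparison are assumptions
    made for illustration only.
    """
    baseline = None
    for n in range(1, min(len(looks), max_trials) + 1):
        if n == window:
            # Summed looking over the first `window` trials.
            baseline = sum(looks[:window])
        # Compare only non-overlapping windows (needs 2 * window trials).
        if baseline is not None and n >= 2 * window:
            recent = sum(looks[n - window:n])
            if recent <= decline * baseline:
                return n
    # Criterion never met: stop at the trial cap (or when data run out).
    return min(len(looks), max_trials)


# Hypothetical infant whose attention halves by trial 6.
print(trials_to_habituate([20, 18, 16, 12, 8, 6]))  # → 6
```

An infant whose looking never declines would simply receive all 12 trials under this sketch.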

Infants responded differently to the test events across the two habituation conditions ([0.273,0.732], ß=0.781, B=0.502, SE=0.114 p<.001, two-tailed, 2 participants excluded on the basis of Cook’s Distance (38), mixed effects model with habituation condition and test event as a fixed interaction, participant identity as a random intercept). When the actor’s reaches were initially constrained by a barrier (H1), infants looked longer, at test, at the inefficient action than the efficient action (Meanineff=15.448s, Meaneff=12.368s, [0.396,1.139], ß=1.194, B=0.768, SE=0.185 p<.001, two-tailed). Critically, this looking preference cannot be attributed to low-level preferences for the curvilinear reach, because infants in the unconstrained condition (H2) showed a preference for the efficient over the inefficient action (Mineff=8.788s, Meff=10.104s, [-0.343,-0.017], ß=-0.28, B=-0.18, SE=0.081 p=0.032, two-tailed). See Figure 2. Experiment 1 therefore provides evidence that infants expect object-directed reaches to be efficient.
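
The analyses above exclude participants on the basis of Cook’s Distance (38), but the computation and cutoff are not given in this section. As a rough illustration only, the sketch below computes Cook’s distance for an ordinary least-squares regression with a single predictor, which is a simplification of the mixed effects models reported here. The function name `cooks_distance`, the 4/n cutoff, and the toy data are assumptions, not the authors’ settings.

```python
def cooks_distance(x, y):
    """Cook's distance for each observation in a simple OLS fit y ~ x."""
    n = len(x)
    p = 2  # number of fitted parameters: intercept and slope
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Residual variance estimate with n - p degrees of freedom.
    s2 = sum(e * e for e in residuals) / (n - p)
    # Leverage of each observation in simple linear regression.
    leverage = [1 / n + (xi - mean_x) ** 2 / sxx for xi in x]
    return [
        (e * e / (p * s2)) * h / (1 - h) ** 2
        for e, h in zip(residuals, leverage)
    ]


# Toy data with one clearly influential final observation.
x = [0, 1, 2, 3, 4, 5, 6, 7]
y = [0.1, 0.9, 2.1, 2.9, 4.1, 4.9, 6.1, 20.0]
d = cooks_distance(x, y)
cutoff = 4 / len(x)  # a common rule of thumb, assumed here
flagged = [i for i, di in enumerate(d) if di > cutoff]
print(flagged)  # → [7]
```

For mixed effects models, influence is usually assessed by refitting the model with each participant removed, so this single-level version only conveys the idea.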

Experiment 2

In Experiment 2, pre-registered at https://osf.io/a5byn/, we tested whether this expectation depends on infants’ construal of the actor as a causal agent, who intends to change the state of the object on contact. We did this by introducing digital manipulations to the habituation and test events from Experiment 1 that abolish causal perception in older infants and adults (7–9, 39, 40). Infants (N=20; Mean age=107 days; range=93-121; 12 female) saw videos identical to those from the constrained condition of Experiment 1, except that the person’s hand never contacted the object (stopping 50 pixels, or 2 cm, above it), and the object changed state after a 0.5 second delay (H3, T2). In contrast to Experiment 1, infants looked equally at test trials showing the inefficient and efficient actions (Mineff=15.306s, Meff=16.38s, [-0.191,0.301], ß=0.096, B=0.055, SE=0.119 p=0.649, two-tailed, mixed effects model with test event as fixed effect and participants as a random intercept). Across Experiments 1 (H1, T1) and 2 (H3, T2), infants responded differently to the test events depending on whether the person acted on the object on contact ([-0.623,-0.003], ß=-0.547, B=-0.313, SE=0.154 p=0.049, two-tailed, mixed effects model with fixed interaction between causal intention and test event and participants as a random intercept). This finding suggests that infants expect others’ actions to be efficient if and only if these actions are guided by causal intentions.

Experiment 3

To evaluate this suggestion further, we conducted a direct replication of Experiments 1 and 2. In Experiment 3, pre-registered at https://osf.io/f2hvd/, we randomly assigned infants (N=52, 26 per condition; M=107 days; range=92-121; 21 female) to events that differed by causal intention alone. This design allowed us to compare infants’ expectations about efficiency to causal (H1, T1) vs non-causal (H3, T2) actions, under testing conditions where all researchers were blind to condition as well as to test events. We fully replicated the findings from Experiments 1-2: Infants again responded to the test events differently depending on whether the person’s hand contacted the object at the time of the object’s state change ([-0.815,-0.184], ß=-0.729, B=-0.5, SE=0.158 p=0.003, two-tailed, mixed effects model with fixed interaction of causality and test event and random intercept for participants). As in Experiment 1, infants looked longer at the inefficient than the efficient reach when the person appeared to intentionally cause a change in the object (Mineff=12.166s, Meff=7.791s, [0.211,0.66], ß=0.635, B=0.436, SE=0.112 p<.001, one-tailed), and as in Experiment 2, infants looked equally to the inefficient and efficient reaches when she did not appear to cause this outcome (Mineff=11.395s, Meff=12.888s, [-0.289,0.16], ß=-0.094, B=-0.064, SE=0.112 p=0.284, one-tailed). Although 3-month-old infants have limited experience acting as causal agents themselves, they understand that other people intend to cause a change in the world through their actions, and tend to do so efficiently.

Infants’ analysis of entrainment actions

Experiments 4 and 5

In Experiment 4, we habituated and tested infants (N=20; M=108 days; range=92-122, 11 female) using events in which a person reached for, picked up, and displaced an object while wearing a mitten (H4, T3), as in prior research (28). In Experiment 5, infants (N=20; M=108 days; range=93-120; 12 female) saw almost identical videos except that the person reached with a bare hand (H5, T4). If infants need first-person action experience to understand acts of picking up and displacing objects, then they should fail in both conditions. If they understand that hands cause changes in objects on contact, as Experiments 1-3 and computational studies (41) suggest, then they might succeed in these experiments, especially in the more familiar bare-handed condition. We found that infants looked longer at the inefficient than the efficient reach of the bare hand in Experiment 5 (Mineff=9.715s, Meff=8.036s, [0.008,0.331], ß=0.296, B=0.17, SE=0.08 p=0.02, one-tailed), but they looked equally to the inefficient and the efficient reach of the mittened hand in Experiment 4 (Mineff=18.029s, Meff=16.844s, [-0.083,0.232], ß=0.13, B=0.074, SE=0.078 p=0.172, one-tailed). However, infants’ expectations in Experiments 4 and 5 did not differ from each other ([-0.128,0.319], ß=0.167, B=0.095, SE=0.111 p=0.396, two-tailed, mixed effects model with fixed interaction between mitten and test event, random intercept for participants, excluding 3 influential participants on the basis of Cook’s Distance). Thus, Experiment 5 provides evidence that infants expect bare-handed reaching and lifting of an object to be efficient, but differences in infants’ responses to mittened and bare hands are not clear.

Comparing state change and entrainment actions

To explore these effects further and compare them to past research (28), we performed a meta-analysis over the ten experiments (Experiments 1-5, and all experiments from Skerry et al. (28)) that used the present paradigm with 3-month-old infants (total N=264) (Fig. S1). As shown in past training studies (22, 28, 30, 42), action understanding was more robust after training, relative to no training ([0.027,0.069], ß=0.558, B=0.049, SE=0.011 p=0.003, two-tailed). Infants also held stronger expectations for efficient reaching when the actor simply touched an object and caused a change in its state, as in Experiments 1-3, than when she lifted and displaced the object, as in Experiments 4-5 and all experiments in Skerry et al. (28) ([0.02,0.053], ß=0.397, B=0.035, SE=0.01 p=0.02, two-tailed). Knowledge of the causal intentions and costs underlying reaching actions therefore arises without training but is more robust if actions are causally transparent or if infants receive action training. For full meta-analytic methods and results, see SM.

Reliability (Methods Section)

To assess reliability, 50% of test trials from participants across Experiments 1-5 (132 participants, 456 trials) were randomly selected and coded by additional researchers who were unaware of experimental condition and test trial order. The intraclass correlation coefficient (ICC) between the original data and these newly coded data was 0.968 [0.955, 0.978], 0.963 [0.938, 0.977], 0.936 [0.911, 0.954], 0.969 [0.946, 0.982], 0.969 [0.943, 0.982], for Experiments 1 through 5, respectively.
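
For readers unfamiliar with the ICC, the sketch below shows one common variant, the two-way random effects, absolute agreement, single-measure coefficient, often written ICC(2,1). Which ICC form was used for the values above is not stated here, so this choice is an assumption, as are the function name `icc_2_1` and the toy looking times.

```python
def icc_2_1(ratings):
    """ICC(2,1) for a list of rows, one row per trial, one column per coder."""
    n = len(ratings)        # number of coded trials
    k = len(ratings[0])     # number of coders
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical looking times (s) from an original and a second coder.
original = [10.0, 5.0, 20.0, 8.0]
recoded = [10.5, 4.5, 19.0, 8.5]
icc = icc_2_1([list(pair) for pair in zip(original, recoded)])
print(round(icc, 3))
```

When the two coders agree exactly on every trial, the error and coder mean squares are zero and the coefficient equals 1.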

Supplemental Results

Comparing Experiments 4-5 with Skerry et al. (2013)

We compared the results of Experiments 4 and 5 against those from Skerry et al.’s Experiment 3, wherein infants received no mitten training and viewed a person reaching with a mittened hand. The results of Experiment 5 (no mitten) differed from those of the earlier experiment (mitten), [0.047,0.547], ß=0.539, B=0.297, SE=0.124 p=0.022, two-tailed, mixed effects model with fixed interaction between experiment and test event and random intercept for participants, one influential participant excluded on the basis of Cook’s Distance. In addition, the results from Experiment 4 (mitten) marginally differed from those of Skerry et al. (mitten), [-0.021,0.47], ß=0.43, B=0.224, SE=0.122 p=0.074, two-tailed, mixed effects model with fixed interaction between experiment and test event and random intercept for participants, 2 influential participants excluded on the basis of Cook’s Distance.

Meta-analytic results

To assess the unique effects of our experimental manipulations in Experiments 1-5 and in Skerry et al. (28), we performed an analysis over these two papers (total N=264, 12 conditions). Our analytic approach allows us to assess the independent effects of 5 manipulations: the type or absence of motor training, the presence or absence of a barrier preventing a direct reach for the object during habituation, the nature of the goal (to change the state of an object or pick it up), the presence or absence of action on contact, and the presence or absence of mittens on the actor. The analysis also allows us to control for the participant variables age and sex, and to model the nested structure of the data (e.g. looks clustered within experiments and within papers). For ease of interpretation, we used average proportion looking to the inefficient action in this analysis, following Skerry et al. (28).
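
The dependent measure for this analysis, proportion looking to the inefficient action, can be illustrated with a short sketch. It assumes the proportion is taken over looking times that were first averaged across the three test pairs; averaging per-pair proportions instead is an equally plausible reading that the text does not rule out. The function name `proportion_to_inefficient` and the looking times are hypothetical.

```python
def proportion_to_inefficient(ineff_looks, eff_looks):
    """Proportion of looking directed at the inefficient reach.

    ineff_looks, eff_looks: looking times in seconds for each test trial
    of that type (three of each in the design described here).
    """
    mean_ineff = sum(ineff_looks) / len(ineff_looks)
    mean_eff = sum(eff_looks) / len(eff_looks)
    # 0.5 means equal looking; above 0.5 means longer looking at the
    # inefficient reach.
    return mean_ineff / (mean_ineff + mean_eff)


# Hypothetical infant: three inefficient and three efficient test trials.
print(proportion_to_inefficient([14.0, 12.0, 10.0], [9.0, 8.0, 7.0]))  # → 0.6
```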

This analysis confirmed the findings from the individual experiments reported in the main text and in Skerry et al. (28): Infants’ expectations were stronger when the observed action was spatiotemporally continuous with its effect (i.e., appeared to be causal), [0.027,0.06], ß=0.501, B=0.044, SE=0.009 p<.001, two-tailed, when infants received effective motor training (sticky mittens), relative to no training [0.027,0.069], ß=0.558, B=0.049, SE=0.011 p=0.003, two-tailed, when the observed agent’s actions were constrained by a barrier and were efficiently adapted to that barrier, relative to the same actions that were unconstrained by a barrier, [0.021,0.051], ß=0.407, B=0.036, SE=0.008 p=0.001, two-tailed, and when the agent pursued a state change goal, relative to a pickup goal, [0.02,0.053], ß=0.397, B=0.035, SE=0.01 p=0.02, two-tailed. We also found that infants’ expectations were marginally negatively affected when they received ineffective motor training (non-sticky mittens), relative to no training, [-0.06,-0.005], ß=-0.354, B=-0.031, SE=0.015 p=0.068, two-tailed, and were unaffected when the actor wore a mitten, relative to no mitten [-0.045,0], ß=-0.232, B=-0.021, SE=0.012 p=0.14, two-tailed, as reported in the main text. These findings provide further evidence that action experience alters action interpretation, for good or for ill, but so does causal information and information about efficiency.

Figure S1. Looking time in seconds towards the efficient versus inefficient reach (bottom), and proportion looking towards the inefficient reach (top) at test across Experiments 1-5 (n=152) and across Experiments 1-5 in Skerry et al. (SCS) (32) (n=112). Labels above each panel list the experiment name (Exp. 1-5, SCS Exp. 1-5), type of motor training (none, ineffective non-sticky mittens, or effective sticky mittens), whether actions during habituation were constrained or unconstrained by a barrier, goal (state.change or pick.up), whether actions resulted in contact with the object, whether the actor wore a mitten, and video displays listed in Figure 1. Error bars around means indicate within-subjects 95% confidence intervals (bottom) and bootstrapped 95% confidence intervals (top). Individual points (top) or pairs of connected points (bottom) indicate data from a single participant. Horizontal bars within boxes indicate medians, and boxes indicate the middle 2 quartiles of data. Violin plots (top) indicate distribution of data, area scaled proportionally to the number of observations.

Figure S2. Effect plots for model investigating predictors of sensitivity to action efficiency across Experiments 1-5 and Skerry et al. (2013) (32) (total N=264, 247 included in final analysis, 17 excluded on the basis of Cook’s Distance). Each point shows estimates of effects at each level of all categorical predictors: Type of motor training (none, ineffective non-sticky mittens, or effective sticky mittens), the goal of the actor (state change vs pick up), action during habituation (constrained or unconstrained by a barrier), whether actions resulted in contact with the object (yes or no), whether the actor wore a mitten (yes or no). Error bars indicate 95% confidence intervals. See Table S1 for full results.



Table S1. Regression table for model investigating predictors of sensitivity to action efficiency across Experiments 1-5 and all experiments from Skerry et al. (total N=264, 247 included in final analysis, 17 excluded on the basis of Cook’s Distance). Dependent measure is proportion looking towards the inefficient reach, averaged across 3 test trials during test. Categorical predictors were coded using sum contrasts, and fixed effects from the model should therefore be interpreted with respect to the grand mean (with respect to 0). Model formula: prop.ineff.all ~ training + goal + hab + causal + mitten + (1|experiment) + (1|ageday) + (1|sex) + (1|paper).

Predictor                Standardized Estimate (ß)  Estimate (B)  Standard Error (SE)  df     t      p      95% CI (Lower)  95% CI (Upper)
(Intercept)              -0.340                     0.488         0.019                2.19   25.28  0.001  0.457           0.523
effective training       0.558                      0.049         0.011                7.32   4.31   0.003  0.027           0.069
ineffective training     -0.354                     -0.031        0.015                8.70   -2.08  0.068  -0.060          -0.005
state change goal        0.397                      0.035         0.010                4.24   3.66   0.020  0.020           0.053
constrained habituation  0.407                      0.036         0.008                9.61   4.54   0.001  0.021           0.051
causally effective       0.501                      0.044         0.009                20.54  5.08   0.000  0.027           0.060
mitten                   -0.232                     -0.021        0.012                7.39   -1.65  0.140  -0.045          0.000
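
Because the caption notes that categorical predictors were coded with sum contrasts, a brief illustration of that coding may help in reading the estimates. The sketch below builds sum-contrast codes for a three-level factor and shows that, with this coding, the intercept is the grand mean of the level means and each coefficient is a level’s deviation from it. Which level served as the omitted one in the reported model is not stated, so treating "none" as the last level is an assumption, and the level means are made-up numbers, not results.

```python
def sum_contrasts(levels):
    """Sum-contrast codes: k levels map to k - 1 columns, last level is -1."""
    k = len(levels)
    codes = {}
    for i, level in enumerate(levels):
        if i < k - 1:
            codes[level] = [1 if j == i else 0 for j in range(k - 1)]
        else:
            codes[level] = [-1] * (k - 1)
    return codes


def coefficients_from_means(level_means):
    """Intercept (grand mean) and deviations implied by the level means."""
    levels = list(level_means)
    grand = sum(level_means.values()) / len(levels)
    coefs = [level_means[level] - grand for level in levels[:-1]]
    return grand, coefs


def predict(level, codes, intercept, coefs):
    """Reconstruct a level's mean from the intercept and coefficients."""
    return intercept + sum(c * b for c, b in zip(codes[level], coefs))


# Hypothetical mean proportions for the three training levels.
means = {"effective": 0.58, "ineffective": 0.46, "none": 0.52}
codes = sum_contrasts(list(means))
intercept, coefs = coefficients_from_means(means)
for level in means:
    print(level, codes[level], round(predict(level, codes, intercept, coefs), 2))
```

Under this coding a positive coefficient means that level sits above the grand mean, which is how the caption asks the fixed effects to be read.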

This analysis confirmed that first-person action experience is not the only way to enhance infants’ appreciation of the causal and intentional aspects of action.

Exclusion info

Table S2. Tally of infants who participated in Experiments 1-5 but were excluded from our final sample. These exclusion criteria were set prior to the start of data collection for each experiment, but vary slightly across experiments (e.g. we relaxed our definition of inattentiveness from excluding all data from a participant if they missed a test trial in Experiment 1, to excluding data from just that trial in Experiments 2-5).

Experiment  Fussiness  Inattentiveness  Caregiver Interference  Experimenter/Coding Error  Technical Failure  Total
Exp.1       9          5                1                       12                         3                  30
Exp.2       0          0                0                       2                          0                  2
Exp.3       6          0                0                       2                          0                  8
Exp.4       7          0                0                       2                          0                  7
Exp.5       6          0                0                       1                          2                  9
Total       28         5                1                       19                         5                  50

Distribution of Looks

Figure S3. Density plot of looking times during test across Experiments 1-5, and Experiments 1-5 from Skerry et al. (2013) (N=264). Maximum-likelihood fitting revealed that the lognormal distribution (log likelihood=-1720.509) provides a better fit to these data than the normal distribution (log likelihood=-1842.196).
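
The comparison in this caption rests on maximized log likelihoods, which have closed forms for both distributions. The sketch below is a minimal illustration of that comparison, not the fitting code used for the figure; the function names and the right-skewed sample are assumptions, and the values it produces are unrelated to those reported above.

```python
import math


def normal_loglik(xs):
    """Log likelihood of xs under a normal distribution at its ML estimates."""
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n  # ML estimate divides by n
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)


def lognormal_loglik(xs):
    """Log likelihood of xs under a lognormal distribution at its ML estimates.

    Equals the normal log likelihood of log(x) minus sum(log(x)), the
    Jacobian term from the change of variables.
    """
    logs = [math.log(x) for x in xs]
    return normal_loglik(logs) - sum(logs)


# Hypothetical right-skewed looking times in seconds.
looks = [2, 3, 3, 4, 5, 6, 8, 12, 20, 45]
print(lognormal_loglik(looks) > normal_loglik(looks))  # → True
```

Both models have two free parameters, so the log likelihoods can be compared directly without a penalty for model complexity.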


Attention during habituation across Exp 1-5

Figure S4. Total looking time in seconds during habituation across Experiments 1-5. Error bars around means indicate bootstrapped 95% confidence intervals (CIs). Individual points indicate data from a single participant. Horizontal bars within boxes indicate medians, and boxes indicate the middle 2 quartiles of data. Violin plots indicate distribution of data, area scaled proportionally to the number of observations.


Figure S5. Looking time in seconds during each habituation trial across Experiments 1-5. Curves with 95% confidence interval ribbons indicate smoothed conditional means, generated using the loess method. Connected points indicate data from a single participant. Labels above each panel list the experiment name (Exp. 1-5), whether actions during habituation were constrained or unconstrained by a barrier, goal (state.change or pick.up), whether actions resulted in contact with the object, whether the actor wore a mitten, and video displays listed in Figure 1.


Table S3. Regression table for mixed effects model analyzing the effect of age, sex, order of test events, habituation condition, goal, mitten, and causal information on total attention during habituation, controlling for variations across Experiments 1-5. Model formula: total_hab ~ ageday + sex + first.test + hab + goal + mitten + causal + (1|experiment).

Predictor         Standardized Estimate (ß)  Estimate (B)  Standard Error (SE)  df      t       p      95% CI (Lower)  95% CI (Upper)
(Intercept)       -0.208                     343.171       76.54                151.82  4.483   0.000  192.192         494.168
Age in Days       -0.233                     -2.058        0.68                 147.57  -3.026  0.003  -3.400          -0.714
Sex               0.066                      5.203         6.11                 148.69  0.852   0.396  -6.916          17.274
First Test Event  -0.006                     -0.439        6.00                 146.69  -0.073  0.942  -12.270         11.393
Habituation       0.222                      17.590        11.03                131.60  1.595   0.113  -6.441          41.220
Goal              0.007                      0.589         16.02                6.18    0.037   0.972  -0.777          50.348
Mitten            0.126                      9.996         19.02                5.73    0.525   0.619  8.336           50.922
Causal            -0.055                     -4.379        9.08                 75.38   -0.482  0.631  -5.024          3.212

To ask whether infants’ total attention during habituation was affected by the experimental manipulations across Experiments 1-5 (action constrained vs unconstrained by a barrier, state change vs pickup goal, mitten vs no mitten on actor, and action with vs without contact with the object), or varied by sex and age, we fit a mixed effects model with these variables as fixed effects and experiment (Exp. 1-5) as a random intercept. We found that the only robust predictor of attention during habituation was age, [-3.4,-0.714], ß=-0.233, B=-2.058, SE=0.68 p=0.003, two-tailed, such that older infants looked for a shorter time overall than younger infants.